Data Clustering using Large p–Median Models and Primal–Dual Variable Neighborhood Search

Authors

  • Pierre Hansen
  • Jack Brimberg
  • Dragan Urošević
  • Nenad Mladenović
Abstract

Data clustering methods have been developed extensively in the data mining literature for detecting useful patterns in large datasets in the form of densely populated regions in a multi–dimensional Euclidean space. One of the challenges in dealing with these large databases is to obtain quality solutions within reasonable CPU time and memory requirements. Earlier partitioning methods such as PAM (1990) become inefficient on these larger sets due to repetitive and lengthy neighborhood searches in the solution space. To get around this problem, CLARA (1990) randomly samples the solution space first, and then solves a clustering problem (p–median) on the much smaller representative sample using PAM. Meanwhile, CLARANS (2002) randomly selects neighborhood points up to a maximum number instead of searching the entire neighborhood. BIRCH (1996) uses a hierarchical approach to cluster the data in a single pass. Improvements are made at the user's option by further passes through the (classified) data. The purpose of this paper is to present a new approach for solving the p–median problem based on the variable neighborhood search metaheuristic. Using a highly efficient data structure and interchange updating procedure within the embedded local search, we are able to analyze very large multi–dimensional datasets directly and quickly. Another feature of our method is that a guaranteed bound on solution quality is obtained by heuristically solving a dual relaxation of the problem.
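
To make the p–median view of clustering concrete, the following minimal Python sketch computes the p–median objective (the sum of distances from each point to its nearest chosen median) and runs a naive interchange descent. It is an illustration only: the function names are hypothetical, the data are made up, and the cost is recomputed from scratch after every swap, so it does not reproduce the efficient data structure or updating procedure mentioned in the abstract.

```python
import math


def pmedian_cost(points, medians):
    """Sum over all points of the distance to the nearest chosen median.

    `medians` holds indices into `points` (an assumption of this sketch).
    """
    return sum(min(math.dist(x, points[m]) for m in medians) for x in points)


def interchange_local_search(points, medians):
    """Naive interchange descent: swap one median for one non-median
    whenever the swap lowers the cost, until no swap helps."""
    medians = list(medians)
    best = pmedian_cost(points, medians)
    improved = True
    while improved:
        improved = False
        for i in range(len(medians)):
            for cand in range(len(points)):
                if cand in medians:
                    continue
                trial = medians[:i] + [cand] + medians[i + 1:]
                cost = pmedian_cost(points, trial)
                if cost < best - 1e-12:  # accept strictly improving swaps only
                    medians, best, improved = trial, cost, True
    return medians, best


if __name__ == "__main__":
    # Two well-separated groups of three points each (illustrative data).
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(interchange_local_search(data, [0, 1]))
```

Each pass over the swap neighborhood costs on the order of p · n cost evaluations here, which is why this naive form would not scale to the large datasets the abstract targets.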


Similar resources

Investigating Zone Pricing in a Location-Routing Problem Using a Variable Neighborhood Search Algorithm

In this paper, we assume a firm tries to determine the optimal price, vehicle route and location of the depot in each zone to maximise its profit. The paper therefore studies zone pricing, which contributes to the literature of location-routing problems (LRP). Zone pricing is one of the most important pricing policies and is widely used by many companies. The proposed problem is v...


The Parallel Variable Neighborhood Search for the p-Median Problem

The Variable Neighborhood Search (VNS) is a recent metaheuristic that combines a series of random and improving local searches based on systematically changed neighborhoods. When a local minimum is reached, a shake procedure performs a random search. This determines a new starting point for running an improving search. The use of interchange moves provides a simple implementation of the VNS algor...
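
The shake-then-improve loop described above can be sketched as follows. This is a generic basic-VNS skeleton applied to a one-dimensional p-median toy instance, not the parallel algorithm of the cited paper; the function names, the parameters `k_max` and `iterations`, and the data are all assumptions made for illustration.

```python
import random


def cost(points, medians):
    """p-median objective on 1-D data: distance of each point to its nearest median."""
    return sum(min(abs(x - points[m]) for m in medians) for x in points)


def shake(n, medians, k, rng):
    """Random solution in the k-th neighborhood: replace k medians by k non-medians."""
    kept = rng.sample(sorted(medians), len(medians) - k)
    outside = [j for j in range(n) if j not in medians]
    return set(kept) | set(rng.sample(outside, k))


def local_search(points, medians):
    """Best-improvement interchange descent (cost recomputed naively)."""
    medians = set(medians)
    best = cost(points, medians)
    while True:
        best_move = None
        for out in medians:
            for inn in range(len(points)):
                if inn in medians:
                    continue
                c = cost(points, (medians - {out}) | {inn})
                if c < best:
                    best, best_move = c, (out, inn)
        if best_move is None:
            return medians, best
        medians = (medians - {best_move[0]}) | {best_move[1]}


def vns(points, p, k_max, iterations, seed=0):
    """Basic VNS: shake in neighborhood k, descend, then move or enlarge k.

    Assumes k_max <= p and k_max <= len(points) - p.
    """
    rng = random.Random(seed)
    current, current_cost = local_search(points, rng.sample(range(len(points)), p))
    for _ in range(iterations):
        k = 1
        while k <= k_max:
            candidate, c = local_search(points, shake(len(points), current, k, rng))
            if c < current_cost:
                current, current_cost, k = candidate, c, 1  # recentre, restart at k = 1
            else:
                k += 1  # no gain: try a larger neighborhood
    return current, current_cost


if __name__ == "__main__":
    print(vns([0, 1, 2, 10, 11, 12, 20, 21, 22], p=3, k_max=3, iterations=5))
```

Resetting `k` to 1 after every improvement keeps the search close to the incumbent, while larger `k` values are tried only when small perturbations stop paying off.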


Solving the p-Center problem with Tabu Search and Variable Neighborhood Search

The p-Center problem consists of locating p facilities and assigning clients to them in order to minimize the maximum distance between a client and the facility to which he or she is allocated. In this paper, we present a basic Variable Neighborhood Search and two Tabu Search heuristics for the p-Center problem without the triangle inequality. Both proposed methods use the 1-interchange (or ver...
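
As a small illustration of the objective just described, the sketch below evaluates the p-center cost (the largest client-to-nearest-facility distance) next to the p-median cost (the sum of those distances) on a distance matrix, which need not satisfy the triangle inequality. The exhaustive solver is only a stand-in for tiny instances and is not the Tabu Search or VNS of the cited paper; all names and the matrix are hypothetical.

```python
from itertools import combinations


def pcenter_cost(d, centers):
    """Maximum, over clients i, of the distance d[i][j] to the nearest open facility j."""
    return max(min(d[i][j] for j in centers) for i in range(len(d)))


def pmedian_cost(d, centers):
    """Sum, over clients i, of the distance to the nearest open facility (for contrast)."""
    return sum(min(d[i][j] for j in centers) for i in range(len(d)))


def solve_pcenter_bruteforce(d, p):
    """Try every set of p facilities; feasible only for very small instances."""
    return min(combinations(range(len(d)), p), key=lambda c: pcenter_cost(d, c))


if __name__ == "__main__":
    # Illustrative symmetric matrix; 9 > 2 + 4, so the triangle inequality fails.
    dist = [
        [0, 2, 9],
        [2, 0, 4],
        [9, 4, 0],
    ]
    best = solve_pcenter_bruteforce(dist, 1)
    print(best, pcenter_cost(dist, best), pmedian_cost(dist, best))
```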


Some Duality Results in Grey Linear Programming Problem

Different approaches have been presented to address the uncertainty of data and to describe the uncertain parameters of linear programming models appropriately. One of them is to use grey systems theory in modeling such problems. Grey linear programming in particular has recently attracted many researchers. In this paper, a kind of linear programming with grey coefficients is discussed. Introducing th...


Solving large p-median problems by a multistage hybrid approach using demand points aggregation and variable neighbourhood search

A hybridisation of a clustering-based technique and of a Variable Neighbourhood Search (VNS) is designed to solve large-scale p-median problems. The approach is based on a multi-stage methodology where learning from previous stages is taken into account when tackling the next stage. Each stage is made up of several subproblems that are solved by a fast procedure to produce good feasible solutio...




Publication date: 2007